Use this space to include your installation screenshots.
Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.
$ mkdir MICB425_portfolio #make portfolio directory within desired directory
$ cd MICB425_portfolio #go to new directory
$ git init #designate it as a repo
$ touch ID.txt #create blank ID.txt file
$ git add . #stage all files in new repo for commit
$ git commit -m "First commit" #commit files
$ git remote add origin https://github.com/ryankn/MICB425_portfolio #designate remote repo URL
$ git remote -v #verify remote repo URL
$ git push -u origin master #push local repo to remote repo
The following is from the activity of recreating the example PDF, with the header levels changed such that they won’t appear in the table of contents.
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Another header, now with maths
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
R code from work for Data Science Friday on 26 Jan 18.
#Libraries
#install.packages("tidyverse")
library("tidyverse")
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#Data Import
metadata <- read.table(file="DS_Friday/26Jan18/Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
#Exercise 1
OTU <- read.table(file="DS_Friday/26Jan18/Saanich.OTU.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
#Exercise 2
metadata %>% rownames_to_column('sample') %>%
filter(CH4_nM >= 100 & Temperature_C <= 10) %>%
column_to_rownames('sample') %>%
select(Depth_m,CH4_nM,Temperature_C)
## Depth_m CH4_nM Temperature_C
## SI072_S3_185 185 310.068 9.091
## SI072_S3_200 200 774.034 9.117
newtable <-
metadata %>% rownames_to_column('sample') %>%
select(matches("nM|sample")) %>%
mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>%
column_to_rownames('sample')
#Exercise 3
metadata %>% rownames_to_column('sample') %>%
select(matches("nM|sample")) %>%
mutate(N2O_uM = N2O_nM/1000, Std_N2O_uM = Std_N2O_nM/1000, CH4_uM = CH4_nM/1000, Std_CH4_uM = Std_CH4_nM/1000) %>%
column_to_rownames('sample')
## N2O_nM Std_N2O_nM CH4_nM Std_CH4_nM N2O_uM Std_N2O_uM
## SI072_S3_010 0.849 0.114 1030.478 3.070 0.000849 0.000114
## SI072_S3_020 13.199 0.000 29.012 0.000 0.013199 0.000000
## SI072_S3_040 12.829 1.509 37.146 2.695 0.012829 0.001509
## SI072_S3_060 12.306 0.524 36.501 3.521 0.012306 0.000524
## SI072_S3_075 13.896 1.417 24.013 0.435 0.013896 0.001417
## SI072_S3_085 12.959 0.955 7.376 0.029 0.012959 0.000955
## SI072_S3_090 15.551 1.417 4.190 0.159 0.015551 0.001417
## SI072_S3_097 18.682 1.628 3.991 0.759 0.018682 0.001628
## SI072_S3_100 18.087 1.275 3.231 0.392 0.018087 0.001275
## SI072_S3_110 15.843 1.953 3.633 0.127 0.015843 0.001953
## SI072_S3_120 16.304 1.085 3.463 0.519 0.016304 0.001085
## SI072_S3_135 12.909 2.577 4.815 0.658 0.012909 0.002577
## SI072_S3_150 11.815 0.000 8.323 0.000 0.011815 0.000000
## SI072_S3_165 6.310 0.732 23.831 2.291 0.006310 0.000732
## SI072_S3_185 0.000 0.000 310.068 0.000 0.000000 0.000000
## SI072_S3_200 0.000 0.000 774.034 12.745 0.000000 0.000000
## CH4_uM Std_CH4_uM
## SI072_S3_010 1.030478 0.003070
## SI072_S3_020 0.029012 0.000000
## SI072_S3_040 0.037146 0.002695
## SI072_S3_060 0.036501 0.003521
## SI072_S3_075 0.024013 0.000435
## SI072_S3_085 0.007376 0.000029
## SI072_S3_090 0.004190 0.000159
## SI072_S3_097 0.003991 0.000759
## SI072_S3_100 0.003231 0.000392
## SI072_S3_110 0.003633 0.000127
## SI072_S3_120 0.003463 0.000519
## SI072_S3_135 0.004815 0.000658
## SI072_S3_150 0.008323 0.000000
## SI072_S3_165 0.023831 0.002291
## SI072_S3_185 0.310068 0.000000
## SI072_S3_200 0.774034 0.012745
R code for Data Science Friday assignment due Friday 16 Feb 18.
#Package Installation
#install.packages("tidyverse")
#source("https://bioconductor.org/biocLite.R")
#biocLite("phyloseq")
#Libraries
library("tidyverse")
library("phyloseq")
#Data Import
new_OTUs <-
read.table("DS_Friday/Assignment20180208/Saanich.OTU.new.txt",
header = TRUE, sep = "\t", row.names = 1, na.strings = "NAN")
new_metadata <-
read.table("DS_Friday/Assignment20180208/Saanich.metadata.new.txt",
header = TRUE, sep = "\t", row.names = 1, na.strings = "NAN")
load("DS_Friday/Assignment20180208/phyloseq_object.RData")
#Exercise 1
ggplot(new_metadata, aes(x = CH4_nM, y = Depth_m)) +
geom_point(color = "purple", shape = 17)
#Exercise 2
new_metadata %>%
mutate(Temperature_F = Temperature_C * 9 / 5 + 32) %>%
ggplot(aes(x = Temperature_F, y = Depth_m)) +
geom_point()
#Exercise 3
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Domain") +
geom_bar(aes(fill=Domain), stat="identity") +
labs(x = "Sample depth", y = "Relative abundance (%)", title = "Domains from 10 to 200 m in Saanich Inlet")
#Exercise 4
new_metadata %>%
select(matches("uM|depth"),-matches("Std"),-H2S_uM) %>%
gather(key = "Nutrient", value = "Concentration", -Depth_m) %>%
ggplot(., aes(x = Depth_m, y = Concentration)) +
geom_point() +
geom_line() +
facet_wrap( ~ Nutrient, scales = "free") +
theme(legend.position = "none") +
labs(x = "Depth (m)", y = expression(paste("Concentration (", mu, "M)")))
The first thing for any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
What were the main questions being asked?
What is the total number of prokaryotes and the total amount of their cellular carbon, nitrogen, and phosphorus on Earth? How are prokaryotes divided up among various large habitats on Earth? What is the turnover time of prokaryotes in these habitats? What effect does this have on prokaryotic genetic diversity?
What were the primary methodological approaches used?
To make calculation of such figures more feasible, the numbers of prokaryotes were examined in the three large habitats where current knowledge suggests most prokaryotes reside: aquatic environments, soil, and the subsurface. All numbers were taken from previously published papers reporting figures such as CFU/mL counts, volume estimations, or C content.
Summarize the main results or findings.
| Environment | No. of prokaryotic cells, x 10^28 | Pg of C in prokaryotes |
|---|---|---|
| Aquatic habitats | 12 | 2.2 |
| Oceanic subsurface | 355 | 303 |
| Soil | 26 | 26 |
| Terrestrial subsurface | 25-250 | 22-215 |
| Total | 415-640 | 353-546 |
The amount of prokaryotic C is roughly 60-100% of the amount in plants, while prokaryotic N and P, at around 85–130 Pg and 9–14 Pg respectively, are likely about an order of magnitude larger than the amounts in plants.
| Habitat | No. of prokaryotic cells | Turnover time, days | Cells/yr, x 10^29 |
|---|---|---|---|
| Marine heterotrophs - above 200 m | 3.6 x 10^28 | 16 | 8.2 |
| Marine heterotrophs - below 200 m | 8.2 x 10^28 | 300 | 1.1 |
| Marine autotrophs | 2.9 x 10^27 | 1.5 | 7.1 |
| Soil | 2.6 x 10^29 | 900 | 1.0 |
| Subsurface | 4.9 x 10^30 | 5.5 x 10^5 | 0.03 |
| Domestic mammals | 4.3 x 10^24 | 1 | 0.02 |
Because such a large number of cells is produced on a regular basis, and given a per-gene mutation rate of 4 x 10^-7, prokaryotes are capable of incredible genetic diversity.
Do new questions arise from the results?
How is the turnover time for the subsurface community so long? At that kind of turnover rate, can that still be considered life? Are definitions of prokaryotic species compatible with the knowledge of the rapid mutation/evolution rate of prokaryotes?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
A myriad of assumptions had to be made, not all of which completely made sense. As one example, the number of prokaryotic cells in the subsurface was derived by extrapolating from data collected at the shallower depths of the subsurface, and thus may not be entirely accurate. Additionally, as someone who does not regularly work with numbers of such magnitude, I found discerning the order of magnitude of fg or Pg masses challenging.
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
| Habitat | Abundance |
|---|---|
| Aquatic | 1.161 x 10^29 |
| Soil | 2.556 x 10^29 |
| Subsurface | 3.8 x 10^30 |
4 x 10^4 cells/mL divided by 5 x 10^5 cells/mL = 8%
What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
autotrophs fix inorganic carbon e.g. CO2 into biomass, heterotrophs assimilate organic carbon, lithotrophs consume inorganic substrates
Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
Deep subsurface habitats, both terrestrial and marine: up to 4 km. The limiting factor is a temperature of 125 degrees C; temperature changes by about 22 degrees C per km.
Mariana Trench - how deep is it? 10.9 km
Mount Everest - 8.8 km
Is anything really alive up in the atmosphere at 77 km? That doesn’t seem likely - lack of nutrients or moisture, then there’s lots of UV radiation too, sketchy. Let’s say 20 km.
Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
Thus the vertical distance is about 24 km from top to bottom (tip of Mount Everest to 4-5 km under the Mariana Trench).
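The 24 km figure can be reconstructed from the values noted above. This is only a sketch of the arithmetic: it assumes the biosphere runs from the summit of Mount Everest to about 4 km beneath the floor of the Mariana Trench, and taking the lower end of the 4-5 km range is an assumption made here.

\[8.8\ \text{km} + 10.9\ \text{km} + 4\ \text{km} = 23.7\ \text{km} \approx 24\ \text{km}\]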
How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
Annual cellular production of prokaryotes was calculated based on literature values for population size and population turnover time in days. In the following example calculation, population size is P, turnover time is T, and annual cellular production is A.
\[A = P \times \frac{365}{T}\]
3.6 x 10^28 cells x 365 days/year / 16 days/turnover = 8.2 x 10^29 cells/year
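The same calculation can be sketched in R. This is a minimal illustration, not anything taken from the paper: the helper name `annual_production` and the vector names are hypothetical, and the inputs are the Table 7 population sizes and turnover times for two of the habitats.

```r
# Annual cellular production: A = P * 365 / T
# P = population size (cells), T = turnover time (days)
# annual_production() is a hypothetical helper name used for illustration
annual_production <- function(P, T_days) {
  P * 365 / T_days
}

# Table 7 values for two habitats
P      <- c(marine_het_above_200m = 3.6e28, marine_autotrophs = 2.9e27)
T_days <- c(16, 1.5)

A <- annual_production(P, T_days)
signif(A, 2)  # about 8.2e29 and 7.1e29 cells per year
```

Because the function is vectorised, other rows of the table can be checked by adding them to `P` and `T_days`.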
0.72 Pg C x 4 = 2.88 Pg C per turnover
51 Pg C per year of productivity x 85% = 43 Pg C per year goes to the upper 200 m
43 / 2.88 = 14.9 turnovers per year
365 / 14.9 = 24.5 days per turnover
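The carbon-budget estimate above can be reproduced with a short R sketch. The variable names are illustrative assumptions. The inputs are the figures from the notes: 0.72 Pg C in the population, the factor of 4 applied to it, 51 Pg C per year of productivity, and 85% of that going to the upper 200 m.

```r
# Carbon-budget estimate of turnover time in the upper 200 m
# (variable names are illustrative; inputs are the figures from the notes)
stock_Pg        <- 0.72  # Pg C held in the heterotroph population
cost_factor     <- 4     # multiplier used in the notes for C needed per turnover
npp_Pg_per_yr   <- 51    # net primary productivity, Pg C per year
frac_upper_200m <- 0.85  # share of productivity consumed in the upper 200 m

c_per_turnover    <- stock_Pg * cost_factor            # Pg C per turnover
c_available       <- npp_Pg_per_yr * frac_upper_200m   # Pg C per year
turnovers_per_yr  <- c_available / c_per_turnover
days_per_turnover <- 365 / turnovers_per_yr

c(turnovers_per_yr = turnovers_per_yr, days_per_turnover = days_per_turnover)
```

Carrying the unrounded 43.35 Pg C per year through gives about 15.1 turnovers per year and about 24.2 days per turnover. That differs slightly from the 14.9 and 24.5 obtained when the supply is rounded to 43 first.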
why does this vary with depth? different production and consumption of C in different habitats
Carbon assimilation efficiency and carbon content determine turnover rates in the upper 200 m of the ocean. The amount of net primary productivity required to sustain prokaryotic turnover depends on both the C assimilation efficiency and the total carbon content of the population, which then sets an upper limit on turnover rates. These vary between habitats because of differences in assimilation efficiency and total carbon content, as well as in the amount of total net primary productivity each habitat zone consumes.
Also viruses - viruses kill bugs, causing turnover, and carry accessory metabolic genes that, when they infect cells, supplement the various metabolic capacities of the community.
4 x 10^-7 mutations/generation
(4 x 10^-7)^4 = 2.56 x 10^-26 mutations/generation
365/16 = 22.8 turnovers per year
3.6 x 10^28 cells x 22.8 = 8.2 x 10^29 cells/year
8.2 x 10^29 cells/year x 2.56 x 10^-26 mutations/generation = 2.1 x 10^4 mutations/year
convert to hours - divide by 365 x 24
2.1 x 10^4 / 365 / 24 = 2.4 mutations/hour
1/2.4 = 0.4 hours/mutation
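The chain of arithmetic above can be checked with a short R sketch. The variable names are hypothetical. The exponent of 4 is read here as four simultaneous mutations in the same gene, which is an interpretation of the notes rather than something they state. The population and turnover time are the Table 7 values for marine heterotrophs above 200 m.

```r
# Frequency of a fourfold mutation event in marine heterotrophs above 200 m
# (variable names are illustrative; inputs are the figures from the notes)
mutation_rate  <- 4e-7    # mutations per gene per generation
n_simultaneous <- 4       # assumed: four simultaneous mutations in one gene
population     <- 3.6e28  # cells (Table 7)
turnover_days  <- 16      # days per turnover (Table 7)

p_event         <- mutation_rate^n_simultaneous      # 2.56e-26 per cell produced
cells_per_year  <- population * 365 / turnover_days  # about 8.2e29
events_per_year <- cells_per_year * p_event          # about 2.1e4
events_per_hour <- events_per_year / (365 * 24)      # about 2.4
hours_per_event <- 1 / events_per_hour               # about 0.4

c(events_per_hour = events_per_hour, hours_per_event = hours_per_event)
```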
Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
As a result of their large population size and high mutation rate, prokaryotes are able to very rapidly adapt to a niche, allowing for great genetic diversity. In addition to this, point mutations aren’t the only way microbial genomes can adapt - horizontal gene transfer as a result of a variety of causes is a major driving force behind prokaryotic genome diversification as well.
What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
The sheer abundance of prokaryotic life lends itself to great genetic diversity, and this diversity in turn leads to wide-ranging metabolic capabilities.
Comment on the emergence of microbial life and the evolution of Earth systems.
Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.
1.3 billion years ago
200,000 years ago First humans
Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:
Precambrian
Phanerozoic
Increased oxygenation of the atmosphere, mass extinction events from meteorite impacts
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
Biogeochemical processes - H, C, N, O, S, and P fluxes
abiotic chemical processes tend to be based on acid/base reactions while biotic ones are based on redox. Reactions are nested, with abiotic processes providing e- acceptors that the biotic reactions use, as well as C, S, and P via tectonics, volcanism and weathering(?)
Why is Earth’s redox state considered an emergent property?
Emergent property of microbial life on Earth
How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
Synergistic multi-species assemblage of the overall pathway
Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
NH4 -> NO2 is a niche, NO2 -> NO3 is a niche; nitrification, typically involves CO2 fixation to organic matter
NH4 + NO2 -> N2 (anammox)
N2 -> NH4 reduction (N fixation)
NO2 or NO3 -> NO -> N2O -> N2 (denitrification), and N2O can be released (greenhouse gas)
NO3 -> NO2 -> NH4 (dissimilatory nitrate reduction to ammonium, DNRA)
What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
On what basis do the authors consider microbes the guardians of metabolism?
On one of the 3 papers for debate - Rockstrom, evidence for anthropocene, safe operating space for humanity
Prompt: *“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.*
The first organisms on Earth were microbes, and as a result, many of the events dramatically impacting the Earth’s development occurred as a direct consequence of microbial processes (1). Earth is only the way that it is today because of microbes, and they have sculpted the global biosphere in their image. Only in recent geological history with the appearance of humans and, more specifically, the emergence of human industrial technology, has the ecological balance of power been shifted away from microbial life (2-4). However, while humanity likely possesses or will someday possess the theoretical capacity to supplant the role of microbes in the biosphere, it is a foolish endeavour, as Earth was shaped by microbes for the sake of microbes, and a sizeable disruption in the way of things will likely have catastrophic effects across the global ecosystem, microbial or not. This is not to say that microbes are saint-like “guardians of metabolism”, as Falkowski et al. put it (5); microbial competition and natural selection prove to be somewhat of a double-edged sword with dramatic consequences if the delicate balance between microbial cooperation and antagonism is perturbed. Additionally, humanity’s ability to exploit microbial processes for human or global benefit will only increase as the slow march of technological progress continues, further blurring the line between what is a distinctly microbial activity and what is human-driven, and making such distinctions less and less relevant going forward. Consequently, while the prompt taken literally is true, its implication that microbes are controlling the direction of the global ecosystem is not. To invoke another set of metaphors for describing the complex relationship of humanity to the microbial realm, microbes are the biogeochemical “engines” of Spaceship Earth, of which humans are at the helm.
To call microbes intimately implicated in global biogeochemical cycling would be an understatement. Indeed, microbial processes underpin the global cycling of the 6 most important elements essential for life as we know it on Earth – Hydrogen, Carbon, Nitrogen, Oxygen, Phosphorus, and Sulfur (5-6). Even with the plethora of macroscopic multicellular terrestrial photosynthetic organisms that exist, unicellular marine microorganisms are responsible for the vast majority of the primary productivity occurring globally (6). Even non-microbial photosynthesis is arguably dependent on microbes on a different temporal scale, as chloroplasts descend from a once free-living photosynthetic organism that entered, perhaps unwillingly, into an endosymbiotic relationship with the common ancestor of all modern eukaryotic photosynthetic organisms (6). And speaking of plants, fixation of atmospheric dinitrogen to a form of nitrogen usable by plants, such as ammonium, within a given local ecosystem is often orchestrated by Rhizobia (7). Without these microbes present, the local ecosystem will very likely collapse without exogenous fixed-nitrogen supplementation in the form of fertilizer. As powerful as this example is, however, it also demonstrates that humans are capable of manipulating a system to fit their needs if nature does not – modern agricultural processes are unable to rely on the comparatively limited amounts of fixed nitrogen these bacteria can produce, and must instead turn to industrial chemistry, namely the Haber-Bosch process, to produce sufficient fixed nitrogen for plant growth. Almost half of all the fixed nitrogen produced on Earth is anthropogenic (2), clearly illustrating that the role of humans in the biosphere is certainly not insignificant. Humans can take an otherwise biological process and scale it up industrially to levels which would be unfathomable for a microbial system.
Additionally, this would take place on a human timescale rather than on a geological or an ecological one; where it took microbes millions if not billions of years to cause a substantial change in the atmosphere, humanity has only been around for a few hundred thousand years and already we can see the effects of humanity’s existence, best exemplified by the global change in climate caused in no small part by greenhouse gas production as a result of human activity (3). Though this effect is clearly negative, it highlights the extraordinary capacity of human actions to affect the biosphere on a timescale which is too rapid for microbes to even approach. Natural selection and competition do not enable microbial life to adapt and evolve as rapidly as human technological capabilities grow, meaning that while microbes are presently responsible for massive proportions of nutrient cycling, this may not necessarily remain the case in the future, and should any changes in the present equilibrium be required, microbes will more likely than not evolve too slowly to adapt to the new conditions, and a global catastrophe will occur should humanity not intervene.
Related to this idea, entirely ordinary microbial processes can often have overtly detrimental impacts due to excessive metabolic changes in response to environmental perturbations. Evidence for this is clearly visible from Earth’s geological history, which is marred by several mass glaciations (8). Of particular importance for this discussion is the Huronian glaciation, which extended from about 2.4 to 2.1 billion years ago, making it the oldest glaciation in Earth’s history (8). Temporally, the Huronian glaciation closely followed the aptly-named Great Oxidation Event, which was the rapid appearance of substantial amounts of dioxygen gas in the atmosphere caused by the emergence of the evolutionary precursor to the modern-day Cyanobacteria (6, 8). As the surviving representatives are today, these early Cyanobacteria were photosynthetic, using the at-the-time untapped energy of the sun to power carbon fixation and thus their own growth, producing dioxygen as a by-product (6). However, much unlike today, that early atmosphere was largely anaerobic, dominated in large part by methane from methanogenesis (6). The rising levels of dioxygen would then react with methane, oxidizing it to CO2, a less potent greenhouse gas, greatly reducing the methane concentration and leading to runaway planetary cooling driven by photosynthesis, and thus to the eventual glaciation event (6). Additionally, dioxygen is toxic to obligate anaerobic organisms which were likely prolific at the time (6), implicating Cyanobacteria as the cause of one of the largest mass extinctions in Earth’s history. While the oxygen-rich atmosphere created by the Great Oxidation Event was necessary for the eventual evolution of complex multicellular lifeforms such as humans, at the time, it was undoubtedly also the cause of a rapid catastrophic change in the biosphere, the effects of which humans feel to this day whenever any individual so much as takes a breath.
Less dramatic examples of the detrimental effects of runaway microbial growth include algal blooms associated with eutrophication, which often lead to mass die-offs in the local area. As microbes are purely concerned with their own individual survival, a sudden evolutionary leap or a rapid change in growth conditions favouring one species over another can cause rapid environmental damage that cannot be reversed by the slow pace of evolution. Indeed, natural selection is the cause of the damage – the dominant organism crowds out the rest, no matter how beneficial the others may be to the survival of the whole or the maintenance of the present equilibrium.
Additionally, the delineation between the microbial and the human realms is unnecessary if not borderline ridiculous. Since the first endosymbiotic events, in which once free-living Cyanobacteria or Proteobacteria were engulfed by a larger unicellular organism for mutual benefit (6), the lives of these burgeoning eukaryotes have always been crucially intertwined with the microbial world. Many symbiotic relationships connecting complex multicellular life to unicellular microorganisms have been identified just in the last two decades, some more akin to this foundational endosymbiosis than others. As alluded to earlier, Rhizobia species form nodules in the roots of some plant species and carry out biological nitrogen fixation to the benefit of the plant, losing their ability to live freely in the process (7). Certain types of squid such as the bobtail squid develop an organ specifically to encourage colonization by Vibrio fischeri, a Proteobacterium which produces light that prevents the squid from casting a shadow and becoming visible to predators (9). And of course, what discussion of the relationship of microorganisms with multicellular eukaryotic life would be complete without mentioning the resident human gut microbiota, whose microbial cells are about as numerous as the human cells in the body (10). Beyond this, the recent emergence of biotechnology, which is based entirely on manipulating and exploiting microbial processes for human industrial application, further blurs the line between the microbial and the human. If it is a microbe producing the biofuel in the human-created bioreactor under the conditions set deliberately by a human to get the microbe to best produce said biofuel, does that make it a microbial activity or a human one?
Perhaps drawing a distinction between the two may have mattered early in Earth’s history, when complex inter-species relationships were in their infancy, but that certainly is no longer the case. Microbes are now intimately involved in most processes occurring on Earth, whether they be metabolic or biogeochemical, and humanity is increasingly discovering ways of manipulating these same microbial processes for its own gain. Treating the two as separate entities can only be considered ignorant in this day and age.
I have demonstrated thus far that microbes, while incredibly capable in very many regards, have their shortcomings due to being simple biological creatures driven solely by evolution and a drive for self-preservation. Though humanity is saddled with this same evolutionary baggage without the same inherent metabolic capabilities that microbes may possess, humans have the capacity for scientific innovation, vast industrial capabilities rivalling those of the microbial world, and the ability to deliberately choose their own direction independent of an evolutionary timescale, making them the main drivers of change in the biosphere. However, it is the horse that moves both itself and the rider atop it, and it is only on the metaphorical backs of the microbes around us that humanity can progress onwards. As such, I agree with the prompt taken literally, but would argue that its implication that the microbial world is responsible for setting the course of the global ecosystem is untrue. While microbes are the biogeochemical “engines” that power Spaceship Earth, it is humans that are at the helm.
Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci U S A 95:6578-6583. PMC33863
Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature 409:1083-1091.
Canfield DE, Glazer AN, Falkowski PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science 330:192-196.
Rockström J, Steffen W, Noone K, Persson Å, Chapin FS, Lambin EF, Lenton TM, Scheffer M, Folke C, Schellnhuber HJ, Nykvist B, de Wit CA, Hughes T, van der Leeuw S, Rodhe H, Sörlin S, Snyder PK, Costanza R, Svedin U, Falkenmark M, Karlberg L, Corell RW, Fabry VJ, Hansen J, Walker B, Liverman D, Richardson K, Crutzen P, Foley JA. 2009. A safe operating space for humanity. Nature 461:472-475.
Schrag DP. 2012. Geobiology of the Anthropocene, p 425-436. In Knoll AH, Canfield DE, Konhauser KO (ed), Fundamentals of Geobiology, 1st ed. Blackwell Publishing Ltd, Oxford, United Kingdom.
Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034-1039.
Kasting JF, Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science 296:1066-1068.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
In 1997, 33 divisions had been described, at least 10 of them without any cultivated representatives (Pace et al. 1997). As of 2003, about half of the 52 identified major phyla had cultivated representatives, probably far more now. In 2016, ~89 bacterial phyla and ~20 archaeal phyla had been identified via small subunit (16S) rRNA databases, but there could be up to 1500 bacterial phyla, as there are microbes that live in the “shadow biosphere”.
How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
How many? A lot - hundreds of thousands; 110217 on the EBI database (not all projects are in public repos).
Environments: sediments, soil, gut, aquatic, especially those where it’s hard to culture communities in lab settings.
What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
Shotgun Metagenomics
- Assembly: EULER
- Binning: S-GCOM
- Annotation: KEGG
- Analysis Pipelines: Megan 5
- Warehousing: IMG/M, MG-RAST, NLB (NCBI), EBI

Marker Gene Metagenomics
- Standalone Software: OTUbase
- Analysis Pipelines: SILVA
- Denoising: AmpliconNoise
- Databases: Ribosomal Database Project (RDP)
SILVA and RDP are gold standards
Phylogenetic
- vertical gene transfer
- carry phylogenetic information allowing tree reconstruction
- taxonomic
- ideally single-copy

Functional
- more horizontal gene transfer
- identify specific biogeochemical functions associated with measurable effects
- not as useful for phylogenetic reconstruction
Risks and Opportunities in binning:
Risks:
- incomplete coverage of genome sequence (working with partial data)
- contamination from different sequences
In class Day 1:
Assignment:
In class Day 2:
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
For example, load in the packages you will use.
#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
#For alpha-diversity calculations
library(vegan)
Then load in the data.
candydata <-
read.table(file = "CandyData.csv", header = TRUE, sep = ",", na.strings = "NA")
For your community:
candydata %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| name | characteristics | population | sample1 | sample2 | sample3 | sample4 |
|---|---|---|---|---|---|---|
| rigoa_maroon | maroon rigoa wine gum | 2 | 0 | 1 | 0 | 0 |
| rigoa_yellow | yellow rigoa wine gum | 1 | 1 | 0 | 0 | 0 |
| rigoa_transluc | transluc rigoa wine gum | 2 | 1 | 1 | 0 | 0 |
| rigoa_orange | orange rigoa wine gum | 1 | 0 | 1 | 0 | 0 |
| rigoa_brown | brown rigoa wine gum | 1 | 0 | 1 | 0 | 0 |
| lego_blusqr | blue 2x2 lego | 1 | 1 | 1 | 0 | 0 |
| lego_blurect | blue 1x2 lego | 3 | 0 | 0 | 1 | 1 |
| lego_grnrect | green 1x2 lego | 2 | 1 | 0 | 0 | 1 |
| lego_yelsqr | yellow 2x2 lego | 1 | 0 | 0 | 0 | 0 |
| lego_yelrect | yellow 1x2 lego | 4 | 0 | 0 | 0 | 1 |
| lego_pinksqr | pink 2x2 lego | 1 | 0 | 0 | 0 | 0 |
| lego_pinkrect | pink 1x2 lego | 6 | 0 | 0 | 0 | 2 |
| drop_yellow | yellow spherical gumdrop | 4 | 0 | 1 | 1 | 1 |
| drop_orange | orange spherical gumdrop | 5 | 4 | 0 | 0 | 1 |
| drop_red | red spherical gumdrop | 7 | 3 | 2 | 0 | 2 |
| drop_green | green spherical gumdrop | 5 | 1 | 1 | 0 | 2 |
| drop_brown | brown spherical gumdrop | 3 | 0 | 2 | 0 | 0 |
| bear_green | green gummy bear | 17 | 5 | 3 | 3 | 4 |
| bear_orange | orange gummy bear | 15 | 4 | 7 | 1 | 3 |
| bear_red | red gummy bear | 11 | 1 | 3 | 1 | 1 |
| bear_transluc | translucent gummy bear | 14 | 5 | 2 | 1 | 3 |
| bear_yellow | yellow gummy bear | 17 | 2 | 5 | 2 | 5 |
| bear_pink | pink gummy bear | 17 | 1 | 9 | 1 | 1 |
| skit_green | green skittle | 45 | 8 | 8 | 1 | 3 |
| skit_red | red skittle | 43 | 6 | 10 | 6 | 3 |
| skit_brown | brown skittle | 37 | 1 | 5 | 9 | 6 |
| skit_yellow | yellow skittle | 35 | 4 | 7 | 4 | 11 |
| skit_orange | orange skittle | 37 | 8 | 5 | 5 | 8 |
| mnm_green | green M and M | 24 | 3 | 5 | 5 | 2 |
| mnm_red | red M and M | 28 | 11 | 5 | 6 | 4 |
| mnm_blue | blue M and M | 39 | 8 | 9 | 7 | 5 |
| mnm_brown | brown M and M | 30 | 3 | 7 | 7 | 4 |
| mnm_yellow | yellow M and M | 32 | 3 | 9 | 8 | 5 |
| mnm_orange | orange M and M | 65 | 13 | 16 | 6 | 9 |
| mike_red | red Mike and Ikes | 41 | 8 | 5 | 4 | 15 |
| mike_green | green Mike and Ikes | 39 | 6 | 8 | 6 | 8 |
| mike_pink | pink Mike and Ikes | 44 | 6 | 10 | 7 | 7 |
| mike_yellow | yellow Mike and Ikes | 45 | 4 | 6 | 3 | 7 |
| mike_orange | orange Mike and Ikes | 30 | 3 | 6 | 6 | 8 |
| macrophage | sugar-covered octopus-shaped multi-coloured gummy | 6 | 2 | 1 | 0 | 0 |
| swirl_blue | blue sugar-covered swirl | 2 | 2 | 0 | 0 | 0 |
| swirl_red | red sugar-covered swirl | 1 | 0 | 0 | 0 | 0 |
| watermelon | red white green fruit-shaped | 1 | 1 | 0 | 0 | 0 |
| kisses | foil-wrapped chocolate | 16 | 5 | 7 | 0 | 3 |
| snake | thin long red candy | 13 | 3 | 3 | 0 | 4 |
| bottle | two-colour sugar-covered gummy | 3 | 0 | 2 | 1 | 0 |
| hock | translucent oval | 1 | 0 | 1 | 0 | 0 |
| port | translucent diamond | 1 | 0 | 1 | 0 | 0 |
| fish | red green fish | 1 | 0 | 0 | 0 | 0 |
To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.
To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
# format data for vegan and for summation
candydata_comm =
candydata %>%
select(-characteristics) %>%
gather(key = "sample", value = "occurrences", -name) %>%
spread(name, occurrences) %>%
column_to_rownames(var = "sample")
# determine total number of counts for each sample
candydata_counts = rowSums(candydata_comm, na.rm = TRUE, dims = 1)
# sample numerically randomly but consistently without replacement from each sample until sample completely depleted
set.seed(1)
randomsample_sample3 = sample(1:candydata_counts[4], candydata_counts[4], replace=FALSE)
# cumulative sum
cum_sample3 =
candydata %>%
select(sample3) %>%
filter(sample3 > 0) %>%
cumsum()
# initialize each category as not-visited
cum_sample3$init = rep(0)
# initialize collection curve data to be plotted
collectcurve_sample3 <-
data.frame(x = 1:candydata_counts[4],
y = rep(0))
# initialize rownum variable
rownum = 0
for(i in seq_along(randomsample_sample3)) {
rownum =
which(cum_sample3 >= randomsample_sample3[i],
arr.ind = TRUE) %>%
first() #determine associated row number of the random value
if (cum_sample3$init[rownum] == 0) {
cum_sample3$init[rownum] = 1 # indicate visited, including for the very first individual
# start the running count at 1 when i = 1, otherwise increment it
collectcurve_sample3$y[i] = if (i > 1) collectcurve_sample3$y[i-1] + 1 else 1
} else {
collectcurve_sample3$y[i] = collectcurve_sample3$y[i-1] # already visited, do not increment count
}
}
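As a cross-check on the loop above, the same curve can be sketched more compactly in base R: expand the counts into one entry per individual, shuffle them, and count how many distinct species have appeared so far. The function name `collectors_curve` and the toy counts are illustrative assumptions, not part of the assignment.

```r
# Hypothetical helper (not from the assignment): collector's curve from a
# vector of per-species counts, using only base R.
collectors_curve <- function(counts, seed = 1) {
  set.seed(seed)
  # one entry per individual, labelled by its species index
  individuals <- rep(seq_along(counts), times = counts)
  # shuffle without replacement (index-based to avoid sample()'s length-1 quirk)
  shuffled <- individuals[sample.int(length(individuals))]
  # an individual adds a new species only the first time its label appears
  data.frame(x = seq_along(shuffled),
             y = cumsum(!duplicated(shuffled)))
}

# toy community: three species with 5, 2 and 1 individuals (made-up numbers)
toy_curve <- collectors_curve(c(5, 2, 1))
toy_curve
```

Whatever the shuffle order, the final y value should equal the number of species with a nonzero count, which gives a quick sanity check for any collector's curve.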
For your sample:
ggplot(collectcurve_sample3, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
Using the table from Part 1, calculate species diversity using the following indices or metrics.
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.
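To make the richness versus evenness point concrete, here is a minimal base-R sketch comparing two made-up communities with the same number of species. The helper name `simpson_reciprocal` and both count vectors are assumptions for illustration only.

```r
# Hypothetical helper (not from the assignment): Simpson Reciprocal Index 1/D
simpson_reciprocal <- function(counts) {
  p <- counts / sum(counts)  # fractional abundance of each species
  1 / sum(p^2)
}

# two made-up communities with equal richness (4 species, 100 individuals each)
even_comm   <- c(25, 25, 25, 25)
skewed_comm <- c(97, 1, 1, 1)

even_div   <- simpson_reciprocal(even_comm)    # reaches the maximum, the species count (4)
skewed_div <- simpson_reciprocal(skewed_comm)  # only slightly above 1
```

The even community reaches the maximum of 4, while the skewed one behaves almost like a single-species community despite having the same richness.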
simpsoncalc <- transmute(candydata,
popdiv = (population/candydata_counts[1])^2,
sam1div = (sample1/candydata_counts[2])^2,
sam2div = (sample2/candydata_counts[3])^2,
sam3div = (sample3/candydata_counts[4])^2,
sam4div = (sample4/candydata_counts[5])^2)
simpsons <- 1/colSums(simpsoncalc, na.rm = TRUE, dims = 1)
simpsons[4]
## sam3div
## 17.81507
simpsons[1]
## popdiv
## 23.62611
Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)
\(S_{obs}\) = total number of species observed
a = species observed once
b = species observed twice or more
(sum(candydata$sample3 >= 1)) + (sum(candydata$sample3 == 1)^2)/(2*sum(candydata$sample3 >= 2))
## [1] 26.88235
(sum(candydata$population >= 1)) + (sum(candydata$population == 1)^2)/(2*sum(candydata$population >= 2))
## [1] 50.59211
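The by-hand calculation can also be wrapped in a small helper, sketched here in base R. The function name `chao1_as_defined` is hypothetical, and the counts are the nonzero sample3 values transcribed from the Part 1 table. Note that the classic Chao1 estimator takes b as the number of species observed exactly twice; this sketch deliberately follows the definition given here (twice or more) so that it reproduces the sample3 value.

```r
# Hypothetical helper (not from the assignment): the richness estimate
# exactly as defined in this document, S_obs + a^2 / (2b)
chao1_as_defined <- function(counts) {
  s_obs <- sum(counts >= 1)  # species observed at least once
  a <- sum(counts == 1)      # species observed once
  b <- sum(counts >= 2)      # species observed twice or more (this document's definition)
  s_obs + a^2 / (2 * b)
}

# nonzero sample3 counts transcribed from the Part 1 table
sample3_counts <- c(1, 1, 3, 1, 1, 1, 2, 1, 1, 6, 9, 4, 5, 5, 6, 7, 7,
                    8, 6, 4, 6, 7, 3, 6, 1)
sample3_chao1 <- chao1_as_defined(sample3_counts)
sample3_chao1  # 25 + 8^2 / (2 * 17), about 26.88
```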
We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package.
We can calculate the Simpson Reciprocal Index using the diversity function.
And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different than what you’ve calculated above.
In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.
For your sample:
diversity(candydata_comm, index="invsimpson")
## population sample1 sample2 sample3 sample4
## 23.62611 21.94009 23.86441 17.81507 20.41667
specpool(candydata_comm, pool = c("population", "sample1", "sample2", "sample3", "sample4"))
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## population 49 49 0 49 0 49 49 0 1
## sample1 34 34 0 34 0 34 34 0 1
## sample2 37 37 0 37 0 37 37 0 1
## sample3 25 25 0 25 0 25 25 0 1
## sample4 32 32 0 32 0 32 32 0 1
These values roughly match my previous calculations.
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.